Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

Authors

Abstract

Sparse training is one of the promising techniques to reduce the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where only N out of M consecutive elements can be nonzero, has attracted attention due to its hardware-friendly pattern and its capability of achieving a high sparse ratio. However, the potential of N:M sparsity to accelerate DNN training has not been fully exploited, and there is a lack of efficient hardware supporting N:M sparse training. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both the forward and backward passes of DNN training, which significantly reduces the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to neatly support both regular dense operations and computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods, ranging from interleave mapping and pre-generation of N:M sparse weights to offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of our training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models and datasets. Experimental results show that the SAT accelerator with the BDWP sparse training method under a 2:8 sparse ratio achieves an average speedup of 1.75x over dense training, accompanied by a negligible accuracy loss of 0.56% on average. Furthermore, our proposed training scheme improves the training throughput by 2.97~25.22x and the energy efficiency by 1.36~3.58x over prior FPGA-based accelerators.
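To make the N:M pattern concrete, the sketch below implements plain magnitude-based N:M pruning in NumPy: within every group of M consecutive weights, the N largest-magnitude entries are kept and the rest are zeroed. The function name `nm_prune` and the grouping over the flattened weights are illustrative assumptions; the paper's BDWP method goes further by applying N:M masks to the weights in both the forward and backward passes of training, rather than this one-shot pruning.

```python
import numpy as np

def nm_prune(weights: np.ndarray, n: int = 2, m: int = 8) -> np.ndarray:
    """Keep the n largest-magnitude entries in every group of m
    consecutive weights and zero out the rest (N:M sparsity).
    Assumes weights.size is divisible by m."""
    flat = weights.reshape(-1, m)                    # group consecutive elements
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    mask = np.ones_like(flat, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (flat * mask).reshape(weights.shape)

# Example: a 4x8 weight matrix pruned to 2:8 sparsity (75% zeros)
w = np.random.randn(4, 8).astype(np.float32)
w_sparse = nm_prune(w, n=2, m=8)
assert (np.count_nonzero(w_sparse.reshape(-1, 8), axis=1) == 2).all()
```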


Similar Articles

Dataflow design of a co-processor architecture for image processing

This paper presents a comparison of two design methodologies applied to the design of a co-processor dedicated to image processing. The first methodology is the classical development flow based on specifying the architecture by directly writing an HDL model in VHDL or Verilog. The second methodology is based on specifying the architecture in a high-level dataflow language, followed by ...

Balance Principles for Algorithm-Architecture Co-Design

Richard (Rich) Vuduc is an assistant professor in the School of Computational Science and Engineering at Georgia Tech. His research lab, the HPC Garage (hpcgarage.org), is interested in high-performance computing, with focus areas in parallel algorithms, performance analysis, tuning, and debugging. He received an NSF CAREER award in 2010 and was an invited member of DARPA's 2009-2010 Computer Sci...

ClosNets: a Priori Sparse Topologies for Faster DNN Training

Fully-connected layers in deep neural networks (DNN) are often the throughput and power bottleneck during training. This is due to their large size and low data reuse. Pruning dense layers can significantly reduce the size of these networks, but this approach can only be applied after training. In this work we propose a novel fully-connected layer that reduces the memory requirements of DNNs wit...
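As a rough illustration of the a-priori sparsity idea (not the actual Clos-network topology used in ClosNets), the sketch below fixes a sparse connectivity mask on a fully-connected layer before training, so the layer's memory and compute footprint is bounded from the start. The class name and the random mask pattern are assumptions made for illustration.

```python
import numpy as np

class FixedSparseLinear:
    """Fully-connected layer whose weights are constrained by a fixed,
    a-priori sparse connectivity mask chosen before training (a generic
    stand-in for structured topologies such as Clos networks)."""

    def __init__(self, in_dim: int, out_dim: int, density: float = 0.1,
                 seed: int = 0):
        rng = np.random.default_rng(seed)
        self.mask = rng.random((out_dim, in_dim)) < density   # fixed pattern
        self.w = rng.standard_normal((out_dim, in_dim)) * self.mask

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Re-apply the mask so the pattern survives dense weight updates.
        return (self.w * self.mask) @ x

layer = FixedSparseLinear(1024, 256, density=0.05)  # ~95% of weights pruned
y = layer.forward(np.random.randn(1024))
```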

DNN-Train: Benchmarking and Analyzing DNN Training

We aim to build a new benchmark pool for deep neural network training and to analyze how efficient existing frameworks are in performing this training. We will provide our methodology and develop proper profiling tools to perform this analysis.

Algorithm/Architecture Co-design of Proportionate-type LMS Adaptive Filters for Sparse System Identification

This paper investigates the problem of implementing the proportionate-type LMS family of algorithms in hardware for sparse adaptive filtering applications, especially network echo cancelation. We derive a re-formulated proportionate-type algorithm through an algorithm-architecture co-design methodology that can be pipelined and has an efficient architecture for hardware implementation. We study the...
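For context, the sketch below implements the classic PNLMS baseline that proportionate-type algorithms build on: each filter tap receives a step size roughly proportional to its magnitude, so the few active taps of a sparse echo path adapt quickly. This is a textbook-style NumPy baseline under assumed parameter values, not the re-formulated pipelined variant the paper derives.

```python
import numpy as np

def pnlms(x, d, taps=64, mu=0.5, rho=0.01, delta_p=0.01, eps=1e-6):
    """Proportionate NLMS for sparse system identification: per-tap gains
    proportional to tap magnitude speed up convergence of active taps."""
    w = np.zeros(taps)
    for n in range(taps, len(x)):
        u = x[n - taps + 1:n + 1][::-1]            # regressor, newest first
        e = d[n] - w @ u                           # a-priori error
        gamma = np.maximum(rho * max(delta_p, np.abs(w).max()), np.abs(w))
        g = gamma / gamma.mean()                   # proportionate gains
        w += mu * e * g * u / (u @ (g * u) + eps)  # normalized update
    return w

# Identify a sparse 64-tap echo path from white-noise input
rng = np.random.default_rng(1)
h = np.zeros(64); h[[5, 20, 41]] = [0.8, -0.5, 0.3]   # sparse true system
x = rng.standard_normal(5000)
d = np.convolve(x, h)[:len(x)] + 1e-3 * rng.standard_normal(len(x))
w_hat = pnlms(x, d)
```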


Journal

Journal title: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Year: 2023

ISSN: 1937-4151, 0278-0070

DOI: https://doi.org/10.1109/tcad.2023.3317789